Genetics Selection Evolution

Springer Science and Business Media LLC

Preprints posted in the last 90 days, ranked by how well they match Genetics Selection Evolution's content profile, based on 33 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
Use of a Sire MGS model to disentangle paternal and maternal origins of genetic variance in lifetime productivity of tropical dairy cattle.

Menendez-Buxadera, A.

2026-03-12 genetics 10.64898/2026.03.09.710651 medRxiv
Top 0.1%
20.0%

Data from 80,713 first-calving cows (1984 to 1989) of the Holstein, Mambi, and Siboney breeds, belonging to seven large dairy enterprises in Cuba and progenies of 1,297 sires, were analyzed. For each cow, the average across all lactations for at least 14 years after first calving was defined as individual productivity (PI), and the corresponding lifetime sum as accumulated productivity (PA). Two genetic models were fitted: a classical Animal Model (M1) and a Sire maternal grandsire model (Sire MGS; M2), aimed at partitioning additive genetic variance into paternal and maternal-line components. Heritability estimates under model M1 were moderate (h² ≈ 0.135 to 0.140), whereas M2 yielded higher values (h² ≈ 0.158 to 0.170), reflecting increased additive variance due to better connectedness across herds. Using estimated breeding values (EBV) for PI and PA, a global cow merit index (H1) was defined under M1. Under M2, a parental index (IM2) combining four standardized predictors (paternal and maternal-grandsire EBV for PI and PA) was constructed. Multiple regression of H1 on IM2 showed that the paternal and maternal-grandsire paths accounted for 73% and 27% of the variation, respectively, indicating a non-negligible maternal-line contribution. Model M2 provided the best overall fit according to information criteria, and cross validation using two independent subsamples and the full population yielded correlations of 0.870 to 0.881, demonstrating strong predictive ability and stability of IM2 rankings. These results support the Sire MGS model as a structural extension of the Animal Model for breeding programs targeting lifetime productivity in tropical dairy cattle.

2
Genetic and heat-stress related environmental influences on pig whole-blood gene expression levels

Durante, A.; Feve, K.; Naylies, C.; Labrune, Y.; Gress, L.; Lippi, Y.; Legoueix, S.; Milan, D.; Gourdine, J.-L.; Gilbert, H.; Renaudeau, D.; Riquet, J.; Devailly, G.

2026-03-18 genomics 10.64898/2026.03.17.712411 medRxiv
Top 0.1%
10.4%

Background: Gene expression levels are affected by genetics and environmental effects. However, quantification of the influence of genetics and environmental effects on gene expression remains limited, especially in farm animals. Here, the relative influence of genetic and heat-related environmental variations on gene expression levels was investigated in pigs, using a backcross herd of diverse heat adaptation levels. Backcross animals were raised in either a tropical or temperate environment. Animals raised in the temperate environment were subjected to an experimental heat stress at the end of their growth. Results: We identified 1,967 differentially expressed genes (DEGs) between pigs raised in the tropical (n = 181) and temperate (n = 180) facilities, and 472 DEGs throughout a 3-week experimental heat stress. A transcriptome-wide association study (TWAS) identified 139 associations between gene expression levels and thermoregulation/production traits. We detected 6,014 expression quantitative trait loci (eQTLs) associated with the expression level of 3,297 genes. Genetic variance was estimated to explain 36.3% of gene expression variance on average, and was the main source of variance for 27.7% of transcripts. Most eQTLs found are located in regions proximal to their assigned genes (cis-eQTLs) and few in distal regions (trans-eQTLs). A trans-eQTL hotspot highlighted a hematopoietic mechanism driven by GPATCH8. An integration of GWAS and TWAS pointed to TMCO1 and ZNF184 as candidate genes for backfat thickness. Conclusions: This study provides a better understanding of the impact of climate, heat stress and genetic influences on the pig whole blood transcriptome.

3
Accurate estimation of canine inbreeding using ultra low-coverage whole genome sequencing

Pellegrini, M.; Kim, R.; Rubbi, L.; Kislik, G.; Smith, D.

2026-04-07 bioinformatics 10.64898/2026.04.04.716453 medRxiv
Top 0.1%
10.1%

The measurement of inbreeding has gained significance across diverse fields, including population and conservation genetics, agricultural genetics, breeding programs for animals and plants, and wildlife management. This is due to the fact that inbreeding leads to increased homozygosity and results in lower genetic diversity, rendering populations more vulnerable to environmental changes, diseases, and other stressors. High or mid-coverage whole genome sequencing (WGS) has been widely used for inbreeding estimation, but it is resource-intensive. We aimed to investigate the use of ultra low-coverage whole genome sequencing (ulcWGS) as a cost-effective alternative for inbreeding analysis. Domestic dogs were used for our study as their extensive breeding histories lead to populations with a wide range of inbreeding levels. We constructed a multi-breed reference panel from high-coverage WGS samples. Inbreeding in independent ulcWGS samples was then estimated using runs of homozygosity (RoH) and inbreeding coefficients (F). We modeled the relationship between these measures and sequencing depth using nonlinear regression, to generate inbreeding estimates relative to sequencing depth. Resulting relative RoH and F measurements were significantly correlated, with purebred dogs exhibiting more runs of homozygosity and higher inbreeding coefficients compared to mixed-breed dogs. Our findings demonstrate that ulcWGS can provide reliable and economical estimations of inbreeding, expanding accessibility to genetic monitoring.
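
The RoH-based inbreeding measure mentioned in this abstract is commonly expressed as the fraction of the genome covered by runs of homozygosity. The Python sketch below illustrates that general definition only. It is not the authors' pipeline, and it does not include their depth-dependent nonlinear regression. The function name `f_roh`, the `(start, end)` segment format, and the `min_length_bp` threshold are assumptions made for illustration.

```python
def f_roh(roh_segments, genome_length_bp, min_length_bp=0):
    """Fraction of the genome covered by runs of homozygosity.

    roh_segments: iterable of (start_bp, end_bp) tuples, assumed to be
        non-overlapping and to satisfy end_bp >= start_bp (hypothetical format).
    genome_length_bp: length of the genome that was scanned for RoH.
    min_length_bp: segments shorter than this are ignored (illustrative filter).
    """
    if genome_length_bp <= 0:
        raise ValueError("genome_length_bp must be positive")
    # Sum the lengths of all segments that pass the length filter.
    covered = sum(
        end - start
        for start, end in roh_segments
        if end - start >= min_length_bp
    )
    return covered / genome_length_bp


if __name__ == "__main__":
    # Two made-up segments totalling 3 Mb on a made-up 30 Mb genome.
    segments = [(0, 1_000_000), (5_000_000, 7_000_000)]
    print(f_roh(segments, 30_000_000))
```

With low-coverage data, the set of detected segments itself depends on sequencing depth, which is why the abstract describes estimates expressed relative to depth rather than a raw ratio like this one.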

4
Detection and evaluation of copy number variation using both linked-read and short-read sequencing in New Zealand dairy cattle

Wang, Y.; Nugroho, T.; Johnson, T. J. J.; Couldrey, C.; Harris, B. L.

2026-04-23 bioinformatics 10.64898/2026.04.20.718595 medRxiv
Top 0.1%
10.1%

In recent years, genetic studies have made significant progress in identifying single-nucleotide polymorphisms (SNPs) associated with cattle health and production traits. However, it is still challenging to identify and validate more complicated forms of variation, such as copy number variation (CNV) and other types of structural variation (SV). In this study, SV regions were identified using 37 New Zealand dairy cattle with linked-read sequence data. A transmission-based framework was used to validate these variants at the population scale. 62,438 putative autosomal SV regions were identified with the LongRanger pipeline following the 10x Genomics recommendations. Copy number states for these regions were subsequently estimated via a read-depth based genotyping method using CNVpytor in a population-representative cohort of 2306 animals using Illumina short-read sequencing technology. Mendelian inheritance of copy number states was assessed using linear mixed models incorporating pedigree information, and transmission levels were used to quantify the biological validity of each CNV region. Transmission levels ranged widely, with a mean of 0.5162 across all regions, where higher transmission levels were proportionally enriched for larger SVs. A total of 7218 CNV regions exhibited high transmission levels (>0.9), indicating strong evidence of inheritance. Among these, 7136 overlapped CNV regions reported in one or more public datasets, while 82 high-confidence regions represent previously unreported variants. High-transmission CNV regions tended to show clear, discrete inheritance patterns in trio families, providing the biological evidence that these CNVs are inherited within the population. Together, these results demonstrate that integrating linked-read sequencing with population-scale transmission-based validation provides a robust framework for identifying high-confidence CNV regions. 
This catalogue of validated CNV regions represents an important resource for downstream functional analyses and the incorporation of structural variation into genomic selection and breeding programs.

5
Exploring genetic, expression and regulatory patterns of parental alleles in Muscovy duck (Cairina moschata) using haplotype-resolved assemblies

Li, T.; Wang, Y.; Zhang, Z.; Chen, C.; Zheng, N.; Wang, J.; Ning, M.; Wang, J.; Ai, H.; Huang, Y.

2026-03-07 genomics 10.64898/2026.03.04.709678 medRxiv
Top 0.1%
6.7%

Background: Although the biological mechanism for heterosis has been debated for a long time, heterosis is widely utilized to increase the global productivity of crops and livestock. Recently, the mechanism has been well characterized in crops and livestock with a male-heterogametic XY system due to genomic assembly advancements, especially the availability of haploid genomes. However, the biological mechanism for heterosis remains unclear in poultry possessing the female-heterogametic ZW system. Results: Here, we assembled chromosome-level diploid and haploid genomes of the Muscovy duck. We developed an efficient and cost-effective method to assemble 12 variation graph-haploid Muscovy duck genomes from three full-sibling pairs with high quality using short-read Illumina sequences. We further characterized genetic, expression and regulatory patterns of parental alleles at multiple scales. We found that maternal haploid genomes generally had more open chromatin organization and higher accessibility, and higher levels of gene expression, while showing similar DNA methylation levels when compared to paternal haploid genomes. In contrast, the female paternal Z chromosome showed the most, and the male paternal Z chromosome presented more, relaxed chromatin organization and chromatin accessibility, and gene expression compared to the male maternal Z chromosome. Thus, the ZW system largely relies on compensation and balance to regulate gene expression on the sex Z chromosome. Moreover, we identified non-Mendelian regions covering 0.26% of the genome (~3.18 Mb). These regions contained lower gene density, GC content, and repeat sequence frequency, but were enriched for DNA motifs bound by transcription factors, likely leading to a compacted chromatin structure and lower chromatin accessibility.
Conclusions: Our work provides a comprehensive profile of the genetic, expression and regulatory patterns of parental alleles in the female-heterogametic ZW system, and might be useful for the utilization of heterosis in poultry.

6
The genetic architecture of milk urea concentration in dairy cattle differs across the lactation cycle

He, Q.; Vasiljevic, S.; Kadri, N.; Watson, N.; Stratz, P.; Mapel, X. M.; Leonard, A. S.; Seefried, F. R.; Pausch, H.

2026-04-24 genomics 10.64898/2026.04.22.719978 medRxiv
Top 0.1%
6.3%

Milk urea concentration (MUC) is an indicator of dietary protein utilization and nitrogen use efficiency in dairy cows. We performed genome-wide association studies (GWAS) on MUC in early, mid, and late lactation in the Holstein (HOL) and Brown Swiss (BSW) dairy cattle breeds using imputed sequence variants. We identified 11 and 17 independent quantitative trait loci (QTL) for MUC across the three lactation stages in BSW and HOL, respectively. While many of these QTL have previously been reported for MUC and other dairy traits, our study provides evidence that some QTL exert lactation-stage specific effects. Our findings suggest that variants at the DGAT1 locus on BTA14 have pleiotropic effects on MUC and other dairy traits. This QTL showed an early lactation-specific association with MUC but impacted milk and fat yield across the entire lactation. We fine-mapped two QTL for MUC in early and mid-lactation in BSW on BTA9 (lead SNP: 9:21392941, Pcorrected = 1.1E-17) and BTA28 (lead SNP: 28:6518357; Pcorrected = 3E-11). We identified lncRNA ENSBTAG00000058688 and IBTK as positional and functional candidate genes for the BTA9 QTL, and KCNK1 as positional and functional candidate gene that harbors a highly significant missense variant for the BTA28 QTL. In conclusion, our results shed light on the genetic architecture of MUC and highlighted QTL harboring potential functional variants underpinning milk urea variation within and across breeds.

7
kinference: Pairwise kinship detection for Close-Kin Mark-Recapture

Bravington, M. V.; Baylis, S. M.; Eveson, P.; Feutry, P.

2026-05-21 genetics 10.64898/2026.05.18.725841 medRxiv
Top 0.1%
4.8%

Close-Kin Mark-Recapture (CKMR) is a statistical framework for estimating demographic parameters of wild populations. Instead of recapturing individuals, it relies on the identification of closely-related pairs such as parents and offspring, or siblings. By measuring how often such close-kin are "recaptured" among sampled animals (whether alive or dead), scientists can estimate demographic parameters such as census size, mortality rates, and connectivity. CKMR is starting to change fisheries and wildlife management by giving more reliable demographic information, even for many species that resist conventional approaches. Here we introduce the kinference R package, which provides a set of tools for finding close-kin pairs among thousands of samples each genotyped at thousands of SNPs, and for associated quality control. The CKMR context implies different requirements and assumptions from those of many other kinship programs. In particular, kinference accounts empirically for linkage without requiring a genome assembly, is able to estimate and control false-negative and false-positive probabilities, and can cope with null alleles. The package has been developed and used in numerous CKMR projects since 2017. This paper documents the assumptions, statistical algorithms, and intended workflow for kinference.

8
Temporal changes in allele frequency facilitate detection of adaptive variants in winter wheat (Triticum aestivum L.) breeding programs

Johansen, N. H.; Sarup, P.; Hansen, P.; Orabi, J.; Jahoor, A.; Ramstein, G. P.

2026-05-04 genetics 10.64898/2026.04.30.721918 medRxiv
Top 0.1%
4.4%

In quantitative genetics, candidate SNPs are identified through genotype-phenotype associations inferred with genome-wide association studies (GWAS). In this study, we explore an alternative approach to detect genetic variants with non-neutral effects by tracking temporal trends in allele frequency in a winter wheat (Triticum aestivum L.) breeding population over an eight-year period, from which signals of selection may be inferred. Selection signatures were inferred with a generalized linear model, where we modeled trends in allele frequency as a function of time (crossing year). These signatures of selection were used to prioritize variants. Associations between phenotypic performance and individual load of prioritized variants were then investigated. Furthermore, we assessed whether incorporating selection information into a genomic best linear unbiased prediction (GBLUP) model improves model performance in terms of quality of fit and prediction ability. Our findings indicate that the inferred signals of selection are effective in identifying non-neutral variants. Variants under strong negative selection were associated with a decrease in protein content adjusted for grain yield (p-value < 0.01), while genetic variants that had been under moderate to high levels of positive selection were associated with increased grain yield (p-value < 0.01). However, incorporating selection information did not improve prediction accuracy. In conclusion, temporal trends in allele frequency can be used to detect non-neutral variants. The proposed approach may hence complement traditional quantitative genetic methods for detecting non-neutral genetic variation. This approach may allow breeders to detect non-neutral variants earlier in the breeding cycle, without resorting to phenotypic data.
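
The abstract describes modeling allele frequency as a function of crossing year with a generalized linear model. One plausible reading, sketched below in Python, is a per-SNP binomial (logistic) regression of allele counts on year, fitted by Newton-Raphson. This is an illustrative assumption, not the authors' specification: the function name `fit_allele_trend`, the input layout, and the choice of a logit link are all hypothetical, and the abstract does not state which link or covariates were used.

```python
import math


def fit_allele_trend(years, alt_counts, totals, n_iter=50, tol=1e-10):
    """Fit logit(p_t) = b0 + b1 * (year_t - mean_year) to allele counts.

    years: crossing years (numeric), with at least two distinct values.
    alt_counts: number of copies of the tracked allele observed each year.
    totals: total number of allele copies sampled each year.
    Returns (slope, standard_error) for the year effect b1.
    """
    mean_year = sum(years) / len(years)
    t = [y - mean_year for y in years]  # centring improves numerical stability
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for ti, k, n in zip(t, alt_counts, totals):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * ti)))
            w = n * p * (1.0 - p)       # binomial variance weight
            g0 += k - n * p             # score for the intercept
            g1 += ti * (k - n * p)      # score for the slope
            h00 += w
            h01 += ti * w
            h11 += ti * ti * w
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * delta = score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    # Approximate variance of b1 is the matching element of the inverse
    # information matrix from the final iteration.
    return b1, math.sqrt(h00 / det)


if __name__ == "__main__":
    # Made-up counts for one SNP over eight crossing years.
    years = list(range(2010, 2018))
    slope, se = fit_allele_trend(years, [20, 25, 30, 38, 45, 52, 60, 66], [100] * 8)
    print(slope, se, slope / se)
```

A positive slope with a large slope-to-error ratio would flag an allele rising in frequency faster than expected by chance under this simple model. A real analysis would also need to account for drift and for relatedness among lines, which this sketch ignores.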

9
Genetic parameters and genotype-by-diet interactions for growth traits in Australian black soldier fly larvae: Implications for selective breeding in the circular bioeconomy

Gowda, K. B.; Septriani, S.; Jones, D. B.; Jerry, D. R.; Tedder, C.; Zenger, K. R.

2026-03-17 genetics 10.64898/2026.03.14.711759 medRxiv
Top 0.1%
3.2%

Background: Black soldier fly larvae (Hermetia illucens, BSFL) efficiently bio-convert organic waste into high-value protein, which has significant potential in domesticated animal feed formulations. BSFL growth and bioconversion potential can be enhanced through selective breeding, which requires accurate estimates of genetic parameters and knowledge of genotype-by-diet (G x D) interactions. However, comprehensive knowledge of G x D interactions is limited, and reports of genetic parameters are sparse across genetic strains and production environments globally. Results: This study estimated heritabilities, dominance effects and genetic correlations for BSFL growth traits and quantified G x D interactions. Phenotypes of 2,097 fifth-instar larvae reared on three diets were recorded, including larval body weight (LBW), length (LL), width (LW), and surface area (LSA). All larvae were genotyped using a custom 6K Allegro SNP panel. Genetic parameters and G x D interactions were estimated by fitting an additive-dominance model in ASReml-R. Heritabilities for growth traits were low across diets (0.05 to 0.14), with diet-specific estimates ranging from low to moderate (0.06 to 0.36). Dominance effects were significant across the traits (0.09 to 0.19), and genetic correlations were high among growth traits (>0.81), except between LW and LL (0.51). G x D interactions were moderate across diets (-0.04 to 0.49). Conclusion: Results suggest that moderate to high genetic gain is achievable over a long-term breeding programme, given the genetic basis of growth traits and the BSF's short generation interval (38 to 45 days). However, G x D interactions must be considered, either through combined or diet-specific selection strategies, and the significant dominance effects suggest heterosis could accelerate improvement.

10
A telomere-to-telomere (T2T) pig genome assembly reveals Y chromosome diversity and structural variations of Wuzhishan pigs

Ren, Y.; Wang, F.; Li, X.; Liu, G.; Sun, R.; Zheng, X.; Zhang, Y.; Lin, R.; Lu, X.; Chen, L.; Xin, W.; Fei, Y.; Chao, Z.

2026-04-27 genomics 10.64898/2026.04.23.720499 medRxiv
Top 0.1%
1.5%

Background: Wuzhishan (WZS) pigs are native to Hainan Province, China, and serve as both important agricultural resources and biomedical models. Although the published WZS pig genome (T2T-pig1.0) achieves telomere-to-telomere (T2T) completeness, substantial genetic diversity still exists within the breed, so another WZS pig genome, named WZS-T2T, was assembled in this study. Results: Multiple sequencing datasets were used to assemble the genome, yielding a ~2.68 Gb telomere-to-telomere assembly with an N50 of ~142.87 Mb and 23,100 annotated protein-coding genes. Compared with T2T-pig1.0, WZS-T2T had higher QV and BUSCO values and a longer Y chromosome (ChrY). The ChrY of the two WZS pigs shared 11 genes, including the sex differentiation-related genes SHOX, PRKX, and DDX3X, as well as SRY; however, the energy metabolism gene SLC25A4 and the macrophage-related receptor gene CSF2RA on ChrY were specific to WZS-T2T. An inversion of ~33.86 Mb on chromosome 10 was identified between the two WZS pigs, and three lines of evidence were presented to support the sequence orientation in WZS-T2T. In the population-level comparison, genetic diversity was consistent with the speed of LD decay. WZS pigs exhibited higher genetic diversity than the four other pig populations examined in this study (Tunchang pigs, Yuxi black pigs, Large White pigs, and Duroc pigs), and presented slower LD decay than those four breeds. Conclusions: WZS-T2T provides a higher-quality assembly, with potential benefits for both agricultural production and biomedical applications of WZS pigs.

11
Uncertainty-aware breeding decisions: MCMC-based optimum contribution selection increases breeding decision robustness

Ahlinder, J.; Waldmann, P.

2026-03-18 genetics 10.64898/2026.03.15.711440 medRxiv
Top 0.1%
1.3%

Current optimum contribution selection (OCS) implementations use point estimates of estimated breeding values (EBVs), potentially leading to suboptimal selections when individuals have uncertain genetic evaluations. We developed a framework assessing how EBV uncertainty affects OCS decisions through MCMC-based approaches using the COSMO optimizer in Julia, evaluated on Norway spruce (Picea abies, n=5,525) and Loblolly pine (Pinus taeda, n=926) populations. Agreement between point estimate (MAP-OCS) and MCMC-OCS was surprisingly low: mean overlap of only 26.6 (4.8) individuals in Norway spruce genotyped subpopulation and 14.1 (3.6) in full pedigree, with Loblolly pine intermediate at 16.0 (9.6). Despite this low individual-level agreement, selection frequency across MCMC iterations corresponded well with EBV rankings (Spearman ρ = 0.782 for Norway spruce), confirming that higher-EBV individuals were preferentially selected under posterior uncertainty. To comprehensively quantify uncertainty impacts, we employed two complementary metrics: individual robustness scores measuring genetic gain stability upon candidate removal, and population-level contribution distribution metrics capturing concentration of genetic gain across selected individuals. Applying these metrics identified 25 high-risk individuals in Norway spruce and nine in Loblolly pine, and constrained exclusion of these individuals improved individual robustness by 16.5% in Loblolly pine (3.00% genetic gain loss) and 29.8% in Norway spruce (2.14% genetic gain loss). Our uncertainty-aware OCS framework successfully identifies unstable selections that may compromise long-term genetic gain, and we recommend assessing EBV uncertainty through posterior distributions and evaluating population-specific trade-offs when implementing uncertainty-aware selection strategies.

12
Bayesian AMMI-Based Simulation of Genotype x Environment Interactions

Lee, H.; Segae, V. S.; Garcia-Abadillo, J.; de Oliveira Bussiman, F.; Trujano Chavez, M. Z.; Hidalgo, J.; Jarquin, D.

2026-03-15 bioinformatics 10.64898/2026.03.11.711188 medRxiv
Top 0.1%
1.2%

Genotype-by-environment interaction (GEI) has been studied to identify environment-stable/favorable genotypes. The GEI simulation could help refine the inference by incorporating tangible factors such as genomic and environmental information. The Bayesian additive main effect and multiplicative interaction (Bayesian AMMI) model captures the genotype-specific responses across environments, reflecting directional relationships between genotypes and environments. Thus, we propose a Bayesian AMMI-based GEI simulation framework that utilizes high-throughput environmental covariance matrices to generate GEI effects with interpretable directional structure. To demonstrate the proposed approach, two simulated phenotypes were assessed under four levels of GEI variance. In the first simulation (Sim1), GEI effects were sampled from a multivariate normal distribution defined by the GEI matrix. In the second simulation (Sim2), GEI effects were generated by extending Sim1 with the Bayesian AMMI model. In both simulations, increasing GEI variance resulted in lower correlations of phenotypes across environments and stronger genotype-specific sensitivity to environmental variation. Across five cross-validation designs, models accounting for GEI consistently outperformed one that did not, with prediction accuracy generally decreasing as GEI variance increased. Clear distinctions between the two simulated phenotypes were evident from biplot analyses: Sim2 successfully captured environmental relatedness and genotype-specific responses, whereas such structure was absent in Sim1. These results demonstrate that the proposed Bayesian AMMI-based GEI simulation framework enables interpretable visualization of GEI and supports genomic selection strategies under complex environmental conditions.

13
Genetic population structure and demographic history of Pacific cod in Japanese waters: Implications for stock identification using SNP markers

Hirao, A. S.; Sakuma, K.; Akita, T.; Chiba, S. N.

2026-03-13 genetics 10.64898/2026.03.11.710969 medRxiv
Top 0.1%
1.2%

Pacific cod is a key species in North Pacific fisheries, and its stock assessment and management units are separated according to biological, geographical, and administrative information. Understanding the fine-scale genetic population structure of this species is crucial for effective management, particularly in regions such as Japan, where complex coastal geography and localised fisheries management prevail. Therefore, in this study, we analysed genome-wide single nucleotide polymorphisms (SNPs; 6,035 loci) in 496 individuals of Pacific cod sampled from 33 sites around the Japanese archipelago via genotyping by random amplicon sequencing-direct (GRAS-Di) analysis. Our analyses revealed three major genetic groups: Japanese Broad Range, Northernmost Honshu-Hokkaido (NHH), and Western Sea of Japan groups. These groups exhibited significant genetic differentiation (global FST = 0.056), distinct levels of nucleotide diversity, and group-specific genome-wide patterns of Tajima's D. Moreover, demographic history reconstruction based on whole-genome sequencing of three representative individuals revealed that each genetic group followed distinct demographic trajectories since the last glacial period. Importantly, the NHH group, related to the Mutsu Bay spawning aggregation and previously shown to exhibit strong natal homing in tagging surveys, was genetically identified for the first time in this study. Isolation-by-distance was observed across Japanese waters and within the Japanese Broad Range group, but not within the NHH group, suggesting that gene flow is generally restricted by geographic distance, except within the NHH group. To evaluate the potential for genetic stock identification, we extended a resampling-based cross-validation framework by incorporating outlier detection to assess marker selection strategies.
Over 500 background SNPs were required to achieve >90% assignment accuracy for genetic stock identification, whereas only eight or more outlier SNPs showed comparable performance. These findings suggest that carefully selected SNP panels, particularly those including outlier loci, substantially improve stock discrimination. Overall, our study demonstrates the fine-scale genetic structure and demographic history of Pacific cod in Japanese waters and highlights the utility of practical marker strategies for enhancing the biological realism of fisheries assessment and supporting sustainable fisheries management.

14
Increasing Phenomic Prediction Efficiency Using A Principal Component Analysis Based Pre-Processing Of Near Infrared Spectra

Bienvenu, C.; Roger, J.-M.; Sene, M.; Castro Pacheco, S. A.; Singer, M.; Felaniaina, B. L.; Terrier, N.; De Bellis, F.; Pot, D.; de Verdal, H.; Segura, V.

2026-05-13 genetics 10.64898/2026.05.10.724118 medRxiv
Top 0.1%
0.9%

Phenomic prediction (PP) is a breeding value prediction method using near infrared spectroscopy (NIRS). Spectra pre-processing is a key step in the analysis pipeline of PP and generally involves chemometrics methods. However, there is still little understanding in the genetics community of what pre-processing does and why it increases performance. Consequently, the choice of pre-processing is made either arbitrarily or through a search for the optimal set of methods and associated parameters. In this study, we propose a PCA-based pre-processing method where genetic values of spectra are estimated on a set of principal components instead of individual wavelengths. This way, estimations are based on a few informative and orthogonal features of spectra instead of many correlated, uninformative wavelengths. We tested this new pre-processing method on five data sets representing four plant species (maize, rice, sorghum and grapevine). Results show that it performs as well as, or better than, the best classical chemometric pre-processing methods in almost all cases. Combining PCA-based and classical chemometric pre-processing methods maximizes predictive ability. Moreover, this pre-processing method opens up possibilities of better understanding and selecting parts of the spectral information that are relevant for the prediction of breeding values. Indeed, components representing together about 1% of spectral variability were found to be responsible for most of PP predictive ability. Plain language summary: Cultivated plants are the result of a breeding process during which their genetic values are used to select those to breed. Estimation of breeding values requires heavy experimental means and is time consuming. Phenomic prediction is a low cost and high throughput genetic value estimation method that is increasingly being used.
It often uses near infrared spectroscopy measurements as predictors of genetic values that are easy to collect and thus routinely used in many species. However, near infrared spectra generally require pre-processing before being used in prediction. Currently used pre-processing methods arise from the chemometrics community, and still deserve a better in-depth appropriation by geneticists. In this study, we propose a new pre-processing approach that performs as well as or better than the best chemometric pre-processing generally used, reduces computation time, and allows for a better understanding of what parts of spectral information are relevant for prediction. Core Ideas: (1) Working on principal components of spectra instead of wavelengths increases predictive ability of phenomic prediction and performs as well as or better than classical chemometrics pre-processing. (2) Working on principal components of spectra requires less optimization of parameters than chemometrics pre-processing. (3) About 1% of spectral variance is responsible for most of the predictive power of phenomic prediction. (4) Working on principal components of spectra pre-processed with classical chemometrics pre-processing can increase predictive ability even more. (5) PCA-based methods are valuable to optimize predictive ability of phenomic prediction and could be used more widely in the quantitative genetics field.

15
Joint modeling of social genetic effects in mono- and pluri-specific groups: case study in intercrops

Salomon, J.; Enjalbert, J.; Flutre, T.

2026-03-31 genetics 10.64898/2026.03.27.714849 medRxiv
Top 0.1%
0.9%

The genetics of interspecific groups remains largely unexplored, despite the central role of social (or indirect) genetic effects in shaping phenotypic expression within communities. Intercropping, i.e. the simultaneous cultivation of multiple crop species in the same field, offers a powerful model to harness these interspecific social effects. Such species mixtures provide well-documented agricultural benefits, yet few breeding frameworks have integrated the genetics of social interactions. Here, we address this gap by extending quantitative genetic theory to interspecific groups, with intercropping as a concrete and applied model case. We propose a quantitative genetic model that jointly analyzes intra and interspecific interactions within a unifying framework. Breeding values are decomposed into a direct component, shared in mono and mixed-crops, an interspecific social component corresponding to the effect of one species on another, and an intraspecific component that captures the social effects within a mono-genotypic stand of cloned plants. Statistically, this consists in simultaneously fitting several linear mixed models, one per stand type, all having direct breeding values in common. As no open-source software can fit such a complex mixed model, we provide such an implementation in R/C++. Simulations across various genetic (co)variance structures and sparse experimental designs showed accurate estimation of all genetic (co)variances and breeding values. With an incomplete, yet balanced design combining sole crops and intercrops, genetic gains in both systems were achievable simultaneously, enabling breeding strategies that progressively integrate intercropping into existing, sole-crop-only schemes. More broadly, this framework allows dissecting direct and social genetic effects when genotypes are observed in mono- and mixed-species situations, cultivated or not.

16
Heat Stress Induces Locus-Specific DNA Hypomethylation Linked to Immune Regulation in Lactating Holstein Cows

Costa Monteiro Moreira, G.; Ruiz Gonzalez, A.; Joigner, M.; Costes, V.; Chaulot-Talmon, A.; Ali, F.; Bourgeois-Brunel, L.; Jammes, H.; Rico, D. E.

2026-03-26 genomics 10.64898/2026.03.23.713208 medRxiv
Top 0.2%
0.9%

Epigenetics may play a crucial role in livestock adaptation to environmental challenges like heat stress. In recent years, a growing number of studies have investigated the epigenetic mechanisms underlying dairy cow adaptation to heat stress. However, there is still limited knowledge about the effects of heat stress on immune cells and immune-related phenotypes. Here we aim to identify heat stress-induced DNA methylation changes in the blood methylome that potentially affect regulatory regions and associated phenotypes. Blood samples were collected and peripheral blood mononuclear cells (PBMCs) were isolated from four cows before (D0) and after (D14) a 14-d heat stress challenge (cyclical THI 72-82) and from four cows kept in thermoneutral conditions (THI 61-64). Heat-stressed cows had ad libitum access to diets supplemented with adequate levels of vitamin D and Ca (12,000 IU/kg of vitamin D and 0.73% Ca, respectively). To eliminate confounding effects due to differences in nutrient intake, cows maintained under thermoneutral conditions were pair-fed (PF) to their heat-stressed counterparts and received adequate concentrations of vitamin D and Ca as well. Reduced representation bisulphite sequencing (RRBS) was used to profile the PBMC methylome. Differential methylation analysis was performed using methylKit and DSS (Δmeth ≥ 25%, adjusted p-value < 0.01), retaining only the differentially methylated cytosines (DMCs) detected by both tools. A total of 2,908 DMCs were identified when comparing pre- and post-heat stress samples. After excluding 649 DMCs that were also detected under thermoneutral conditions, as these changes were likely associated with feed restriction inherent to the pair-feeding design rather than with heat stress per se, 2,259 heat stress-specific DMCs remained, predominantly hypomethylated. About half of the DMCs are annotated in intronic and intergenic regions, which are known to harbor regulatory elements.
By intersecting the DMRs with publicly available functional annotation data, we observed hypomethylation in regulatory regions putatively affecting the cow immune system. As an example, we identified a loss of methylation within an enhancer region of the MSN gene, which is involved in lymphocyte homeostasis, and a loss of methylation in the promoter region of MECP2, a well-established epigenetic regulator with a central role in chromatin organization and gene expression. These findings highlight the impact of heat stress on dairy cow immunity and provide new insights into its epigenetic regulation under environmental stress.

Interpretative summary: This study examined DNA methylation changes induced by heat stress in dairy cows to elucidate epigenetic mechanisms of thermal adaptation. Using RRBS on PBMCs, 2,259 heat stress-specific differentially methylated cytosines were identified, predominantly hypomethylated and enriched in regulatory regions. Functional annotation highlighted immune-related pathways, including hypomethylated regulatory regions near genes (e.g., MSN, ZBTB33, SLC25A5, GNAS, FAM3A, and MECP2) associated with immune function. These findings indicate that heat stress induces targeted epigenetic modifications potentially affecting immune regulation in dairy cows.
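
The filtering logic described in this abstract (keep cytosines called by both tools, then drop those also seen in the thermoneutral controls) can be sketched in a few lines of Python. This is an illustration only, not the authors' pipeline: the function names, the dictionary layout, and the toy site labels are hypothetical, and applying the 25% threshold to the absolute methylation difference is an assumption.

```python
def significant_sites(results, min_delta=25.0, max_padj=0.01):
    """Return site IDs passing both thresholds.

    `results` maps a site ID to (delta_meth_percent, adjusted_p).
    Assumption: the 25% cutoff applies to the absolute difference.
    """
    return {
        site
        for site, (delta, padj) in results.items()
        if abs(delta) >= min_delta and padj < max_padj
    }


def heat_stress_specific(tool_a, tool_b, control_sites):
    """Sites called by both tools, minus those also found in controls."""
    common = significant_sites(tool_a) & significant_sites(tool_b)
    return common - set(control_sites)


# Toy inputs (hypothetical values, not data from the study).
tool_a = {
    "chr1:100": (-30.0, 0.001),
    "chr1:200": (26.0, 0.005),
    "chr2:50": (-40.0, 0.02),    # fails the adjusted p-value cutoff
    "chrX:10": (-28.0, 0.0001),
}
tool_b = {
    "chr1:100": (-27.0, 0.002),
    "chr1:200": (24.0, 0.001),   # fails the 25% cutoff
    "chr2:50": (-35.0, 0.001),
    "chrX:10": (-31.0, 0.003),
}
control = {"chrX:10"}            # also changed in pair-fed controls

specific = heat_stress_specific(tool_a, tool_b, control)
```

The same set arithmetic reproduces the counts reported above: 2,908 common DMCs minus the 649 shared with the thermoneutral group leaves 2,259.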

17
Genomic epidemiology of the 2017-2023 outbreak of Mycoplasma bovis sequence type ST21 in New Zealand

French, N. P.; Burroughs, A.; Binney, B.; Bloomfield, S.; Firestone, S. M.; Foxwell, J.; Gias, E.; Sawford, K.; van Andel, M.; Welch, D.; Biggs, P. J.

2026-04-10 genomics 10.64898/2026.04.07.717125 medRxiv
Top 0.2%
0.8%

Mycoplasma bovis was first detected in cattle in New Zealand in 2017, prompting an eradication programme that incorporated extensive surveillance and a test-and-cull policy. Genome sequence data and phylodynamic models were used to inform decision making throughout the eradication programme. Isolates from 697 cattle on 126 farms were collected and sequenced between July 2017 and December 2023. Phylodynamic models were used to estimate the time of most recent common ancestor, the effective reproduction number (Reff) and effective population size, and long-range and local between-farm transmission dynamics. The analysis revealed the dramatic impact of movement restrictions and culling up to early 2020, with a sharp reduction in Reff to less than 1 in 2018/9 and the extinction of two of three major lineages in 2020. This was followed by three years of residual infection in farms in the South Island, associated with persistent infection of a large feedlot farm and nearby farms. The comprehensive dataset of genomic and epidemiological data provided a unique opportunity to study the dynamics of a country-wide outbreak of a single-host pathogen from first detection to potential eradication, underlining the utility of integrated genomic surveillance during an outbreak response.

Author summary: The economically important cattle pathogen, Mycoplasma bovis, was first detected in New Zealand in 2017. This led to a large-scale, successful control programme aimed at eradication of the pathogen. The decision to undertake an eradication programme was informed by initial analyses of whole genome sequences from isolates collected as part of the surveillance programme. The analysis showed that the bacterium had entered New Zealand relatively recently and was unlikely to be widespread.
Over the subsequent years, genome sequencing and modelling of transmission dynamics informed important policy decisions made by the New Zealand Government and the cattle industry, and helped to monitor progress of the eradication programme. The impact of the detection, movement control and culling programme was profound, with sharp reductions in transmission between 2018 and 2020. This was followed by a long tail of localised infection in the South Island, involving transmission from a large feedlot farm. Provisional eradication was achieved after depopulation of this feedlot. This analysis highlights the role of genomic surveillance and modelling in informing decision making during an infectious disease outbreak.

18
Genetic Characterization of the TAPBP and Its Haplotypic Association with BF2 in the Chicken Major Histocompatibility Complex

Fernando, R.; Agulto, T. N.; Cho, E.; Kim, J.; van Hateren, A.; Kim, M.; Prabuddha, M.; Lee, J. H.

2026-04-23 genetics 10.64898/2026.04.20.719781 medRxiv
Top 0.2%
0.7%

TAPBP is a key chaperone of the peptide-loading complex that facilitates peptide loading onto major histocompatibility complex class I (MHC I) molecules. This study characterized TAPBP alleles in Korean Native Chickens (KNCs), identified novel variants, and evaluated haplotypic associations with BF2. Thirty-six samples representing six KNC lines were genotyped using LEI0258 and the MHC-B SNP panel, and individuals homozygous at both markers were classified into 16 groups. The same samples were subjected to Sanger sequencing of TAPBP exons 3-8. Sequences were assembled and aligned against MHC-B reference haplotypes and the Red Junglefowl reference. Additional comparisons with "tapasin allele" datasets enabled the identification of novel variants. Six novel nucleotide variants were detected across exons 3-6, including one nonsynonymous substitution in exon 4 (D251H). This residue corresponds to position Q265 in human TAPBP and lies adjacent to residues involved in MHC I interaction, suggesting potential functional relevance. Furthermore, TAPBP exhibited high haplotype diversity (Hd = 0.93) and moderate nucleotide diversity (π = 0.00892), with exon 5 showing the highest diversity (π = 0.01). B9 was the most frequent haplotype at the nucleotide level, whereas B6/B24 predominated at the amino acid level. Comparison with BF2 data revealed haplotype-dependent pairing patterns: BF2-B9 consistently matched TAPBP-B9, whereas BF2-B6 was associated with distinct TAPBP nucleotide variants, indicating allelic diversification within a shared haplotypic background. Homozygosity at LEI0258 and the SNP panel corresponded with TAPBP homozygosity, supporting marker-based prediction. These findings highlight potential BF2-TAPBP associations and provide a foundation for understanding variation in MHC I peptide loading.
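
For readers unfamiliar with the haplotype diversity statistic (Hd) quoted in this abstract, the short Python sketch below computes it with the commonly used Nei estimator, Hd = n/(n-1) × (1 − Σ p_i²). The abstract does not say which estimator or software the authors used, so this is an assumption, and the haplotype labels in the example are illustrative rather than data from the study.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's haplotype diversity for a list of haplotype labels.

    Hd = n / (n - 1) * (1 - sum(p_i ** 2)), where p_i is the sample
    frequency of haplotype i and n is the number of sampled sequences.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Illustrative sample: one haplotype seen twice, two seen once.
example = ["B9", "B9", "B6", "B24"]
hd = haplotype_diversity(example)
```

Hd ranges from 0 (all sequences share one haplotype) to 1 (every sequence carries a different haplotype), so a value of 0.93 indicates that most sampled sequences differ from one another.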

19
Optimizing resource allocation in Miscanthus breeding with sparse testing designs for genomic prediction

Proma, S.; Lubanga, N.; Sacks, E.; Leakey, A. D. B.; Zhao, H.; Ghimire, B. K.; Lipka, A. E.; Njuguna, J. N.; Yu, C. Y.; Seong, E. S.; Yoo, J. H.; Nagano, H.; Anzoua, K. G.; Yamada, T.; Chebukin, P.; Jin, X.; Clark, L. V.; Petersen, K. K.; Peng, J.; Sabitov, A.; Dzyubenko, E.; Dzyubenko, N.; Glowacka, K.; Nascimento, M.; Campana Nascimento, A. C.; Dwiyanti, M. S.; Bagment, L.; Shaik, A.; Garcia-Abadillo, J.; Jarquin, D.

2026-03-23 genomics 10.64898/2026.03.18.712722 medRxiv
Top 0.2%
0.7%

Phenotyping high-biomass perennial crops is laborious and the rate of genetic gain in perennial crop breeding programs is typically low. It is therefore especially important to identify methods that produce efficiency gains in the breeding process. Miscanthus is a C4 perennial grass with favorable characteristics for producing biomass as a feedstock for biofuels and diverse biobased products. Increasing biomass yield will increase profitability and environmental benefits, so it is a key target for Miscanthus breeding. In addition, the identification of well-adapted genotypes across a wide range of environmental conditions requires the establishment of multi-environment trials (METs). Sparse testing is a genomic prediction-based strategy that reduces the phenotyping costs in METs by selecting a subset of genotypes to evaluate in a subset of environments and then predicting the performance of the unobserved genotype-environment combinations. A Miscanthus sacchariflorus (MSA) population comprising 336 genotypes observed across three environments was analyzed. Three prediction models considering main effects (environments, genotypes, genomic) and interaction effects (genotype-by-environment; GxE interaction) were implemented for forecasting dry biomass yield (YDY), total culm (TCM), average internode length (AIL), and culm node number (CNN). Multiple calibration sets based on different compositions and sizes were considered to evaluate performance in terms of the predictive ability (PA) and the mean square error (MSE) for a fixed testing set size. The training set size ranged from 52 to 112 to predict a fixed set of 224 unobserved genotypes across all three environments. The results showed that the model accounting for GxE interaction presented the highest PA and the lowest MSE for CNN (PA: ~0.77, MSE: ~0.5) and YDY (PA: ~0.70, MSE: ~1.3), while for TCM and AIL these ranged from ~0.28 to 0.41 and ~1.3 to 4.3, respectively.
Overall, varying training sets and allocation strategies did not affect PA and MSE, with 52 non-overlapping and 0 overlapping genotypes per environment as the optimal cost-effective allocation framework. This suggests that implementing sparse testing designs could reduce phenotyping costs fivefold without compromising PA in breeding programs for perennial crops such as Miscanthus.

20
Progeny differentiation in faba bean using hyperspectral images and machine learning

Schlichtermann, R.-H.; Warnemuende, S.; Tietgen, H.; Welna, G.; Stahl, A.; Wittkop, B.; Snowdon, R.

2026-05-21 genetics 10.64898/2026.05.19.725957 medRxiv
Top 0.2%
0.7%

Though currently a minor crop, faba bean is a promising source of plant-based protein as global diets shift towards more plant-based nutrition. To realise this potential, advances in breeding and cultivation are crucial. To exploit heterosis, faba bean breeding frequently utilises synthetic cultivars, which involve open pollination of inbred lines to produce a mixture of F1 hybrid seeds and self-pollinated offspring. Pure F1 hybrid cultivars are currently unavailable due to unstable cytoplasmic male sterility (CMS) systems. The ability to distinguish F1 seeds from their parental inbreds via characteristics associated with xenia effects could change this. The xenia effect refers to the influence of paternal pollen on seed traits, for example seed weight and cotyledon cells in faba bean. In this study, we exploited the xenia effect captured in hyperspectral imaging data to develop machine learning scenarios for discriminating between parental and F1 seeds of open pollinated synthetic combinations (Syn-1). The hyperspectral data were pre-processed using Savitzky-Golay filtering to reduce noise and smooth the spectra. Various machine learning algorithms were applied, incorporating Bayesian hyperparameter optimisation. The scenarios achieved up to 98.9% accuracy in separating parental components of Syn-1. When including all seeds, the model achieved 40.7%, indicating moderate detection and classification performance. As the harmonic mean of precision and recall, the F1 score accounts for both the correctness of F1 seed detections and the completeness with which F1 seeds were detected. While this approach does not yet enable the development of full hybrid cultivars, it paves the way for hybrid-enriched cultivars. These could help to streamline breeding for synthetic cultivars and potentially increase yields, for example by increasing the proportion of F1 hybrid seeds in synthetic cultivars.
This study extends knowledge of the xenia effect in faba bean and provides a basis for further research aimed at enhancing breeding methods and productivity.
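
Because this abstract uses "F1" both for hybrid seeds and for the F1 score, a small worked example of the metric may help. The sketch below computes the F1 score as the harmonic mean of precision and recall from raw counts. The function name and the counts are hypothetical and are not results from the study.

```python
def f1_score(true_pos, false_pos, false_neg):
    """F1 score: harmonic mean of precision and recall.

    precision = TP / (TP + FP)  -> how many flagged seeds really are hybrids
    recall    = TP / (TP + FN)  -> how many true hybrids were flagged
    Returns 0.0 when either quantity is undefined or both are zero.
    """
    flagged = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 8 hybrid seeds correctly flagged, 2 parental
# seeds wrongly flagged as hybrid, 8 hybrid seeds missed.
score = f1_score(true_pos=8, false_pos=2, false_neg=8)
```

With these counts precision is 0.8 and recall is 0.5, giving a score of about 0.615. The harmonic mean sits closer to the lower of the two values, so a classifier cannot score well by being precise while missing most hybrid seeds.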